Moving object detection (MOD) and tracking are among the most important tasks
in computer vision. Many signal-processing applications use regional image
statistics, and local image statistics are used for edge identification and
filtering. Field programmable gate array (FPGA) image processing handles
compute-intensive video and image workloads with low latency and high
throughput. The histogram of oriented gradients (HoG) algorithm extracts
local shape characteristics by equalizing histograms. The objective of this
work is to design a hardware chip for the algorithm and to simulate it in the
Xilinx ISE 14.7 environment. The behaviour of the chip is verified in
Modelsim 10.0 simulation software to check its feasibility. The performance
of the chip design is estimated on a Virtex-5 FPGA and compared with the
response time of the MATLAB-2020 image processing tool. This form of tracking
typically deals with identifying, anchoring, and tracking objects in images
and videos. A mask made from a cut-out of the object can then determine the
plane's coordinates depending on its position. This type of object tracking
is frequently used in augmented reality (AR). The algorithm is best suited to
object detection with hardware controllers in hazy and foggy environments.

1. INTRODUCTION


Identifying moving targets is a crucial part of video target tracking [1], and effective video tracking
requires good moving target detection. Moving target detection is the technique of extracting, from image
sequences, moving objects that stand out visually from the background, based on features such as intensity,
edge, and texture. Its goal is to separate moving foreground targets from the stationary background. Put
simply, moving target detection establishes whether a moving object is present in a video sequence and
where it is located.

Numerous moving target detection techniques exist, but the most significant are the optical flow method,
the inter-frame difference method, background subtraction (BS), and various enhanced algorithms derived from
them. The optical flow approach is the most computationally intensive and has the highest hardware
requirements, which makes real-time detection harder to achieve [2]. The BS method requires substantially
more effort for background modelling and background updating: to avoid being affected by environmental
changes such as illumination and interference, the background image must be built quickly and refreshed
in time.
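The difference between inter-frame differencing and BS can be illustrated with a minimal sketch. It is not taken from any of the cited works: the function names, the threshold, and the running-average update with factor `alpha` are assumptions chosen for illustration, and frames are plain nested lists of grayscale values to keep the example dependency-free.

```python
def motion_mask(reference, frame, threshold=25):
    """Return a binary mask: 1 where |frame - reference| exceeds threshold."""
    return [
        [1 if abs(f - r) > threshold else 0 for r, f in zip(ref_row, frm_row)]
        for ref_row, frm_row in zip(reference, frame)
    ]


def update_background(background, frame, alpha=0.05):
    """Running-average background refresh: B <- (1 - alpha) * B + alpha * F."""
    return [
        [(1 - alpha) * b + alpha * f for b, f in zip(bg_row, frm_row)]
        for bg_row, frm_row in zip(background, frame)
    ]


# Tiny 2x3 grayscale frames (hypothetical values): one pixel brightens.
previous = [[10, 10, 10], [10, 10, 10]]
current = [[10, 200, 10], [10, 10, 10]]

# Inter-frame difference: the reference is simply the previous frame.
diff_mask = motion_mask(previous, current)

# Background subtraction: the reference is a slowly refreshed background model.
background = [[10.0, 10.0, 10.0], [10.0, 10.0, 10.0]]
bs_mask = motion_mask(background, current)
background = update_background(background, current, alpha=0.5)
```

The only structural difference is the reference image: the previous frame in one case, a maintained background model in the other. The extra modelling and refresh step is where the additional effort of BS comes from.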

Modern intelligent systems make significant use of computer vision and video analysis tools [3].
Because cameras are simple to set up, use, and maintain, video-based systems can capture a wide variety
of the needed data and are relatively economical. Given the vast number of video cameras now installed,
there is a pressing demand for automatic video recognition systems that can replace human operators in
monitoring the regions under surveillance. A good tracking system can find and monitor every object in a
video-based intelligent system. Once tracking results have been collected, a well-separated feature-based
event detection model can be constructed.

Object-tracking techniques include appearance-based, model-based, feature-based, mesh-based,
contour-based, and hybrid techniques [4]. Model-based tracking techniques take advantage of the geometry
of typical objects in a scene, which is known in advance. Constructing parameterized object models makes
it possible to track partially occluded objects. Appearance-based methods use a dynamic model of the
video objects to track connected regions that roughly match the objects' 2-D shapes. The tracking depends
on information provided by the complete region, such as motion, colour, and texture. Complex deformation
typically exceeds the capability of these methods. Contour-based approaches follow only the outline of an
object rather than each pixel: the contour is projected and then adapted to the object observed in the
following frame. Rather than using static images as in traditional object detection, video object
detection uses video data to find objects. Video surveillance and autonomous driving are two applications
that have significantly influenced the development of video object detection [5]. The ImageNet large
scale visual recognition challenge (ILSVRC2015) added video object detection as a new task in 2015, which
has contributed to an increase in research on the topic.

Computer vision is a key application of smart embedded systems and is used in many fields, of which
robotics and human-computer interaction are only two examples [6]. Object tracking, a core part of
computer vision, is highly useful in applications such as unmanned vehicles, computerized traffic
control, surveillance, living biological image analysis, and intelligent robots. Object tracking follows
moving objects along their paths through a sequence of video frames. Like most computer vision tasks, it
requires intensive processing to extract the needed information from large amounts of video input [7].
High-speed object tracking methods are also required because of the real-time demands of certain computer
vision and related applications [8]. As process shrinks have made it possible to fit more transistors
onto a single silicon chip, field programmable gate arrays (FPGAs) have emerged as attractive computing
platforms for complex applications that demand high performance and low power consumption [9]. Their
large number of programmable logic blocks, memory modules, and high-performance digital signal processing
(DSP) components makes them well suited to porting programs to spatially parallel architectures [10].


2. RELATED WORK 

Images of outdoor scenes frequently show fog, haze, mist, and other atmospheric degradation because
airborne particles absorb and scatter light [11]. This effect influences how people see remote-sensing
images. Histogram equalization, phase function consistency testing, and bilateral filters are all methods
that utilize multi-retinex theory to reduce undesirable artifacts and improve the visual clarity of the
result. The work in [12] proposed a detection-based tracking system for fractures in ship inspection
videos. The detection stage is performed by a customized RetinaNet model that uses optimized anchor
settings and a post-processing step to discard redundant estimates. The tracking stage has two main
parts, an enhanced channel and spatial reliability tracking (CSRT) tracker and a novel data association
algorithm, and it maintains tracking information for each tracked trajectory. Given an initial target,
the improved CSRT tracker predicts the tracking information in subsequent frames, and the data
association algorithm matches detections to the existing trackers.

The two primary tasks of video surveillance systems are BS and moving object detection (MOD) [13].
However, noise in the captured video sequence is one of the main issues that seriously compromises
detection accuracy. The authors proposed a new MOD approach, called de-noising and moving object
processing by a lower-ranked approximation method, that works on noisy video data. The method produces
accurate visual results and quantitative measures. It was evaluated under many test conditions, including
shadow, inclement weather, camera jitter, and dynamic background. The algorithm of [14] improves
transmittance and maximizes the distinctive light value while resolving the colour degradation caused by
the dark channel prior technique. After restoration, the image is converted from red, green, and blue
(RGB) to hue, saturation, value (HSV) space for further improvement. The multiscale retinex with colour
restoration (MSRCR) technique is used to enhance the V component, and an adaptive stretching strategy is
used to improve the saturation. The simulation experiments show that the new approach addresses the noise
amplification and edge blur that arise when traditional enhancement algorithms improve the image. The
authors of [15] proposed a haze/fog removal method that uses the tetrolet transform to split a foggy image
into high- and low-frequency groups according to their operational information, and a residual frequency
extractor based on dual dictionary learning to extract more residual image data. Sharpening the tetrolet
coefficients extracts more precise information, while a dark channel prior (DCP) operation on the
low-frequency part recovers more fog-free information. After the inverse-transformed image is combined
with the remaining high-frequency image component, contrast-limited adaptive histogram equalization is
applied in post-processing to balance the contrast.

An efficient image dehazing method based on multi-exposure image fusion and improved colour channel
transfer is proposed in [16]. A colour channel transfer procedure based on k-means is used in the initial
preprocessing of the image. A series of multi-exposure images is then created using gamma correction,
introduced on the basis of guided filtering, and the images are combined into a dehazed image using a
Laplacian pyramid fusion strategy with locally adaptive weights. The dehazed image then receives contrast
and saturation improvements. The authors of [17] suggested a haze removal method that combines a proposed
hybrid DCP module, a proposed colour analysis (CA) module, and a proposed visibility recovery (VR) module
to prevent the formation of significant artifacts. When the captured road image has localized light
sources or colour-shift problems, the technique can effectively suppress those light sources and prevent
colour shifts. According to quantitative and qualitative experimental evaluations, the procedure removes
haze from single images taken in practical situations more successfully than existing state-of-the-art
methods.

In [18], a multi-resolution wavelet pyramid is built using the lifting wavelet multi-resolution
technique, and the problem of target misalignment is resolved by an improved L-K algorithm. The speeded
up robust features (SURF) feature point matching technique is combined with it. A multi-resolution
wavelet pyramid optical flow technique, which reduces the likelihood of outliers in feature point
detection, resolves difficulties with high speed, object deformation, haze, fog, uneven illumination,
and partial occlusion. A real-time moving target detection system for static backgrounds, based on edge
detection and inter-frame difference, is presented in [19]. The enhanced algorithm addresses the edge
deletion and "empty" (hole) phenomena of the three-frame difference method, and the shortcomings of the
conventional three-frame difference approach are highlighted. Combined with the Canny edge detection
algorithm, the enhanced three-frame difference algorithm detects moving targets with more complete
information. The algorithm makes effective use of the three-frame difference and background elimination
methods for strong performance.
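As a rough illustration of the conventional three-frame difference idea discussed for [19], the sketch below ANDs the two consecutive difference masks so that only pixels that changed in both intervals survive. It is an assumed textbook form, not the enhanced algorithm of [19]: the Canny edge fusion is not shown, and the function names and threshold are hypothetical.

```python
def abs_diff_mask(a, b, threshold):
    """Binary mask: 1 where the two frames differ by more than threshold."""
    return [
        [1 if abs(x - y) > threshold else 0 for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


def three_frame_difference(f_prev, f_curr, f_next, threshold=20):
    """AND of the masks |f_curr - f_prev| and |f_next - f_curr|."""
    d1 = abs_diff_mask(f_curr, f_prev, threshold)
    d2 = abs_diff_mask(f_next, f_curr, threshold)
    return [[p & q for p, q in zip(r1, r2)] for r1, r2 in zip(d1, d2)]


# A bright one-pixel "object" moving one column to the right per frame.
frame0 = [[0, 100, 0, 0, 0]]
frame1 = [[0, 0, 100, 0, 0]]
frame2 = [[0, 0, 0, 100, 0]]

mask = three_frame_difference(frame0, frame1, frame2)
```

The AND keeps only the object position in the middle frame, which removes the "ghost" left at the previous position by a two-frame difference. It also shows why holes appear inside slowly moving, uniformly coloured objects: interior pixels do not change between frames.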

A new automatic segmentation technique for video sequences that can extract moving objects is given
in [20]. The object tracker at the heart of this approach uses the Hausdorff distance to compare
successive frames with a 2-D binary model of the object. The best match found reflects the
transformation the object has undergone, and the model is updated in every frame to accommodate the
changes the object has undergone. The initial model is generated repeatedly, and a new model update
technique based on the idea of moving connected components permits quite substantial changes in shape.
The proposed approach is enhanced by a filtering method that removes the stationary background. The study
in [21] analyses the most common moving object compression strategies presented recently, together with
trends in moving object compression. It first summarizes the concepts and procedures of traditional
moving object compression systems. It then addresses the definitions of moving objects and their
trajectories. Third, it presents the evaluation measures for assessing the effectiveness and performance
of compression methods. Finally, a few application scenarios are summarized to highlight potential future
applications.
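The Hausdorff-distance matching used by the tracker in [20] can be sketched as follows. This is a brute-force illustration under assumptions, not the method of [20]: the model and frame are given as lists of (x, y) edge points, only integer translations within a small hypothetical window `max_shift` are searched, and the model update step is omitted.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from any point of a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def best_shift(model, frame_points, max_shift=2):
    """Translation (dx, dy) of the model that minimises the Hausdorff distance."""
    candidates = [
        (dx, dy)
        for dx in range(-max_shift, max_shift + 1)
        for dy in range(-max_shift, max_shift + 1)
    ]
    return min(
        candidates,
        key=lambda s: hausdorff(
            [(x + s[0], y + s[1]) for x, y in model], frame_points
        ),
    )


model = [(0, 0), (1, 0), (0, 1)]          # binary model as edge points
next_frame = [(2, 1), (3, 1), (2, 2)]     # same shape moved by (2, 1)

shift = best_shift(model, next_frame)
```

The shift with the smallest distance is taken as the object's motion between the two frames.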

Clustering can effectively group objects into multiple classes based on cluster centres and
previously undiscovered patterns present in the dataset [22]. As location-based positioning technology
advances, an increasing number of moving points are tracked and their paths recorded. As a result,
research on moving object data mining will increasingly centre on moving object trajectory clustering.
The data collection method proposed in [23] for extremely erratic raw IoT-oriented sensor data uses
device-to-device communication. When there are significant uncertainties at the fog server, the approach
first reconstructs the subspace from sample data and then iteratively finds a low-rank approximation of
the dominant subspace. The real sensor data background is then estimated from the highly erratic raw
IoT-oriented sensor data stream of the traffic matrix using the robust dominant subspace. An
object-tracking system was implemented on reconfigurable hardware with an efficient parallel architecture
in [24]. That implementation uses a BS-based approach. To attain high system speed, the object tracker
takes advantage of hardware parallelism, and a dual object region search strategy is proposed to improve
performance under challenging tracking situations. The hardware system was implemented on an Altera
Stratix III EP3SL340H1152C2 FPGA device and compared with a software application running on a 2.2 GHz
processor. For complex visual inputs, the observed speedup can be as high as 100X.

The Kalman filter [25] is the most popular prediction method because it is the simplest, most
effective, and easiest to use for linear measurements. To meet design criteria for embedded applications,
however, such filters are tailored to hardware platforms like FPGAs and GPUs. Motion detection and
object tracking are addressed in that work using the multi-dimensional Kalman filter (MDKF) technique,
and the MDKF is implemented on the Zynq-7000 System-on-Chip (SoC) from Xilinx. Numerical analysis shows
that the tracking algorithm is competitive with state-of-the-art tracking algorithms on standard
benchmarks. An effective object detection algorithm and its FPGA implementation for real-time video are
presented in [26]. The system detects key points using the SURF procedure on individual video frames and
describes them using the fast retina keypoint (FREAK) approach. High object detection accuracy is ensured
by one-to-one feature matching between the descriptors of the library objects and the descriptors of the
video frames. The reported tests show that the FPGA-based system works flawlessly and can process video
frames with an 800x600 resolution at 60 frames per second, and that on common benchmarks the algorithms
run 23 times faster on the FPGA configuration than on an Intel Core i5-3210M CPU. A multi-class
classifier for binary feature vectors was created by simplifying the Naive Bayes classifier [27]. It
operates quickly and efficiently during both the training and testing phases and is built on an FPGA with
relatively few hardware resources. It was first tested on a dataset of handwritten digits before being
used for object detection in an FPGA-oriented visual surveillance system.

In [28], the object classification phases of an object recognition system are implemented by an
image classifier that uses an FPGA and a random-access memory (RAM)-based distributed architecture.
Compared to programmable DSP-based systems, the design delivers a considerable performance boost. The
study demonstrates how abundant I/O resources and a pipelined architecture contribute to the significant
performance gain achieved with the FPGA solution. It also serves as an example of how an FPGA solution
can be used for tasks with high data flow and intricate algorithmic requirements, such as real-time
video processing. The design was implemented on the RC1000-PP Virtex FPGA-based platform using the
Handel-C language. An FPGA-based method for effective target recognition in hyperspectral images was
created by the authors of [29]. The Reed-Xiaoli (RX) and constrained energy minimization (CEM) algorithms
are optimized using the streaming background statistics (SBS) methodology. These are popular methods for
anomaly detection and target detection, respectively [30]. Both techniques are implemented on FPGAs in a
streaming mode. Most importantly, a dual approach offers an adaptable datapath that selects between the
two techniques in real time, enabling the hardware to adapt dynamically to target detection or anomaly
detection.

The related work shows that object tracking in images and video has mostly been performed in
MATLAB and Python simulation environments, with different image sizes, filtering, and image processing
techniques [31]. Very few works have been reported in which FPGA hardware is used for high-performance
object tracking algorithms [32] and in which the performance of the algorithm is estimated from a
hardware design and switching point of view. This research work presents an algorithm in that direction,
simulated in the Xilinx ISE 14.7 environment.


3. PROPOSED ALGORITHM 
There are numerous use cases for object tracking, with various types of input footage. The
techniques used to build object tracking applications depend on whether the expected input is a
real-time video as opposed to a prerecorded video, an image, or both. The generic method for object
detection and tracking is given in Figure 1.

- Camera for images: modern object-tracking techniques can be applied to real-time images or video
  streams from practically any camera. Object tracking can therefore be done on the video stream from a
  USB camera or an IP camera by passing the individual frames to a tracking algorithm. With real-time
  video inputs from one or more cameras, frame skipping and parallelized processing are frequent
  techniques to enhance object tracking performance [33].

- Image/video pre-processing: a video consists of a sequence of frames, and each frame depicts a
  different state of an object. The object is detected in the initial frame and then tracked throughout
  the video sequence [34].

- Object detection: the term "object detection" refers to a category of computer technology that searches
  videos and digital images for occurrences of semantic objects belonging to a particular class, such as
  automobiles, buildings, and people. BS, optical flow, and frame differencing are some basic methods
  for object detection.

- Object localization: object localization is the process of identifying the type of object of interest
  for each detected object in the frame, that is, determining what kind of object it is. Several
  features, such as texture, colour, motion, and shape, can be used to identify an object. The approach
  is shape-based, texture-based, colour-based, or motion-based, depending on the features used.

- Object tracking: object tracking uses successive image frames to monitor an object and work out how it
  moves relative to other objects. The most common technique is to measure how far the object's centroid
  has moved in (X, Y) between frames. The three methods of object tracking are point-oriented tracking,
  kernel-based tracking, and silhouette-based tracking [35].
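The centroid-displacement measurement mentioned in the last step can be sketched in a few lines. The sketch assumes that a binary foreground mask of the object is already available for each frame; the function names are hypothetical and the masks are nested lists to avoid external dependencies.

```python
def centroid(mask):
    """Centroid (x, y) of the nonzero pixels of a binary mask, or None if empty."""
    points = [
        (x, y)
        for y, row in enumerate(mask)
        for x, value in enumerate(row)
        if value
    ]
    if not points:
        return None
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def displacement(mask_prev, mask_curr):
    """Centroid shift (dx, dy) between two frames, or None if either is empty."""
    c0, c1 = centroid(mask_prev), centroid(mask_curr)
    if c0 is None or c1 is None:
        return None
    return (c1[0] - c0[0], c1[1] - c0[1])


# A 2x2 object moves two pixels right and one pixel down.
mask_a = [[1, 1, 0, 0],
          [1, 1, 0, 0],
          [0, 0, 0, 0]]
mask_b = [[0, 0, 0, 0],
          [0, 0, 1, 1],
          [0, 0, 1, 1]]

shift = displacement(mask_a, mask_b)
```

Accumulating these per-frame shifts gives the object's path through the sequence.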


Figure 1. Object detection and tracking 


The histogram of oriented gradients (HoG) is a feature descriptor used in computer vision and image
processing to identify objects. Within a detection window, or region of interest (RoI), the HoG
descriptor method counts occurrences of gradient orientation in localized areas of an image. The
behaviour of the HoG is shown in Figure 2, in which the entire image is processed using a (4x4) mask.
The implementation strategy and methodology of the HoG descriptor algorithm [36] are shown in Figure 3.


[Figure 2 panel labels: Before; (16x8)]

Figure 2. HoG execution of the images

- First, split the complete image into small connected areas called cells, and create a histogram of
  oriented gradient directions (edge orientations) for the pixels of each cell.

- Discretize each cell into angular bins according to gradient orientation.

- Each pixel of a cell contributes a weighted gradient to its matching angular bin.

- Blocks are spatial regions made up of adjacent cells. Grouping cells into blocks is the basis for
  grouping and normalising histograms.

- The block histogram is the group of normalised histograms. The collection of block histograms forms
  the descriptor.

- The following basic configuration settings are needed to compute the HoG descriptor: masks for
  computing gradients and derivatives, the geometry for dividing an image into several cells and
  combining those cells into separate blocks, the block overlap, and the normalisation parameters.


Figure 3. Methodology 
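The binning and normalisation steps of the methodology can be sketched as follows. Several choices here are assumptions rather than details given in this work: 9 unsigned orientation bins over 0 to 180 degrees, nearest-bin voting weighted by gradient magnitude (no interpolation between bins), and L2 block normalisation with a small `eps` to avoid division by zero.

```python
import math


def cell_histogram(magnitudes, angles, n_bins=9):
    """Magnitude-weighted histogram of unsigned orientations for one cell."""
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    for mag_row, ang_row in zip(magnitudes, angles):
        for mag, ang in zip(mag_row, ang_row):
            # Fold the angle into [0, 180) and clamp against rounding at 180.
            index = min(int((ang % 180.0) // bin_width), n_bins - 1)
            hist[index] += mag
    return hist


def normalize_block(cell_hists, eps=1e-6):
    """Concatenate the cell histograms of one block and L2-normalise them."""
    vector = [value for hist in cell_hists for value in hist]
    norm = math.sqrt(sum(v * v for v in vector) + eps * eps)
    return [v / norm for v in vector]


# One 2x2 cell with hypothetical gradient magnitudes and angles (degrees).
mags = [[1.0, 2.0], [3.0, 4.0]]
angs = [[0.0, 10.0], [45.0, 170.0]]

hist = cell_histogram(mags, angs)
block = normalize_block([hist])
```

The final descriptor is the concatenation of all normalised block vectors over the detection window.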


Consider the input image from which HoG features must be calculated. Resize the image to
128x64 pixels (128 pixels in height and 64 pixels in width). This dimension may be adapted to the type
of detection required in order to give better object detection results. The gradient of the image is
then computed and expressed as a magnitude and an angle. For each pixel in a (4x4) block, $G_x$ and
$G_y$ are first calculated using (1) and (2).

$$G_x(r_a, c_a) = \mathrm{Im}(r_a, c_a + 1) - \mathrm{Im}(r_a, c_a - 1) \tag{1}$$

$$G_y(r_a, c_a) = \mathrm{Im}(r_a - 1, c_a) - \mathrm{Im}(r_a + 1, c_a) \tag{2}$$

Here $r_a$ and $c_a$ denote the row and column being processed. After $G_x$ and $G_y$ are estimated,
the magnitude and phase are calculated using (3) and (4), respectively.

$$M = \sqrt{G_x^2 + G_y^2} \tag{3}$$

$$\theta = \left| \tan^{-1}\!\left( \frac{G_y}{G_x} \right) \right| \tag{4}$$

After the gradient of each pixel has been computed, the gradient magnitude matrix and the angle
matrix are grouped into 4x4 cells to make a block, and the bin boundaries and centres are decided to
estimate the feature vectors.
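A software reference for (1) to (4) can help when checking a hardware simulation. The sketch below evaluates the four quantities for one interior pixel. Two points are assumptions: the phase is read as the absolute value of the arctangent in degrees, and the case $G_x = 0$, which (4) leaves undefined, is mapped to 90 degrees (or 0 when both gradients vanish). Border pixels are not handled.

```python
import math


def pixel_gradient(im, r, c):
    """Return (gx, gy, magnitude, phase_degrees) for interior pixel (r, c)."""
    gx = im[r][c + 1] - im[r][c - 1]      # equation (1): column difference
    gy = im[r - 1][c] - im[r + 1][c]      # equation (2): row difference
    magnitude = math.hypot(gx, gy)        # equation (3)
    if gx == 0:
        # Assumed convention: vertical gradient is 90 degrees, flat region is 0.
        phase = 90.0 if gy != 0 else 0.0
    else:
        phase = abs(math.degrees(math.atan(gy / gx)))   # equation (4)
    return gx, gy, magnitude, phase


# 3x3 neighbourhood with hypothetical intensities; the centre is pixel (1, 1).
image = [[0, 50, 0],
         [10, 0, 40],
         [0, 10, 0]]

gx, gy, mag, phase = pixel_gradient(image, 1, 1)
```

For this neighbourhood the gradients are 30 and 40, so the magnitude is 50 and the phase is about 53.13 degrees.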


4. RESULTS AND DISCUSSIONS 

The chip view of the HoG algorithm is shown in Figure 4. All the pins are described below to
explain the input and output signals of the design. Register transfer level (RTL) provides a relatively
low level of abstraction, which makes it possible to describe digital circuits without much difficulty.
The RTL has the clk signal and reset signal as its main inputs and is extracted from the Xilinx ISE
simulation tool.

Image_in_Histogram_Pixel_0<31:0> to Image_in_Histogram_Pixel_9<31:0> present the pixel0 to pixel9
intensity input integers before processing in histogram equalization.


[Figure 4 pin labels: Image_in_Histogram_Pixel_0<31:0> to Image_in_Histogram_Pixel_15<31:0>,
Pixel_identification<31:0>, clk, reset, Image_out_Histogram_Pixel_0<31:0> to
Image_out_Histogram_Pixel_15<31:0>]

Figure 4. RTL of the HoG chip design


Image_in_Histogram_Pixel_10<31:0> to Image_in_Histogram_Pixel_15<31:0> present the pixel10 to
pixel15 intensity input integers before processing in histogram equalization.

Image_out_Histogram_Pixel_0<31:0> to Image_out_Histogram_Pixel_15<31:0> present the pixel0 to
pixel15 intensity output integers after processing in histogram equalization. The clk input supplies the
clock signal, whose positive edge triggers the design, and reset resets all the pixel values.
Figure 5 presents the Modelsim simulation of HoG for object tracking for test-1 and test-2 with the pixel
values shown as integers. Figure 6 shows the same simulation with the pixel values shown in binary.
Table 1 lists the test-1 values (Image_in_Histogram_Pixel_ and Image_out_Histogram_Pixel_) and Table 2
lists the test-2 values (Image_in_Histogram_Pixel_ and Image_out_Histogram_Pixel_).
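Since the pins are described in terms of histogram equalization over 16 pixel intensities, a generic software reference may be useful for comparison with the waveforms. The sketch below is the textbook cumulative-histogram mapping and is not claimed to be the mapping implemented in the chip. The number of grey levels (`levels=8`) is an assumption, and the example input is hypothetical rather than taken from Table 1 or Table 2.

```python
def equalize(pixels, levels=8):
    """Textbook histogram equalization of integer pixels in range(levels)."""
    n = len(pixels)
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1

    # Cumulative distribution of the intensities.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:
        # All pixels share one intensity, so there is nothing to spread.
        return list(pixels)
    return [
        round((cdf[value] - cdf_min) * (levels - 1) / (n - cdf_min))
        for value in pixels
    ]


# Hypothetical low-contrast input occupying only levels 0 to 3.
inputs = [0, 0, 1, 1, 2, 2, 3, 3]
outputs = equalize(inputs)
```

The output spreads the four occupied levels across the full 0 to 7 range, which is the contrast-stretching effect equalization is used for.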


[Figure 5 waveform signals: /image_object_tracking/clk, /image_object_tracking/reset,
/image_object_tracking/image_in_histogram_pixel_0 to /image_object_tracking/image_in_histogram_pixel_15,
/image_object_tracking/image_out_histogram_pixel_0 to /image_object_tracking/image_out_histogram_pixel_9]

Figure 5. Modelsim simulation of HoG for object tracking for test-1 and test-2 in integer real value of pixels

Figure 6. Modelsim simulation of HoG for object tracking for test-1 and test-2 in binary value of pixels


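Table 1. Test-1

| Pixel | Image_in_Histogram_Pixel_ | Image_out_Histogram_Pixel_ |
|---|---|---|
| 0 | 4 | 5 |
| 1 | 1 | 3 |
| 2 |  | 4 |
| 3 | 2 | 4 |
| 4 |  | 4 |
| 5 | 1 | 3 |
| 6 | 1 | 3 |
| 7 | 1 | 3 |
| 8 | 0 | 1 |
| 9 | 1 | 3 |
| 10 |  | 5 |
| 11 |  | 4 |
| 12 | 1 | 3 |
| 13 | 1 | 3 |
| 14 |  | 4 |
| 15 | 2 |  |


Table 2. Test-2

| Pixel | Image_in_Histogram_Pixel_ | Image_out_Histogram_Pixel_ |
|---|---|---|
| 0 |  | 0 |
| 1 |  | 0 |
| 2 |  | 0 |
| 3 | 4 | 4 |
| 4 | 1 | 2 |
| 5 | 1 | 2 |
| 6 | 1 | 2 |
| 7 | 5 | 5 |
| 8 | 1 | 2 |
| 9 | 1 | 3 |
| 10 | 2 | 3 |
| 11 | 7 | 7 |
| 12 | 2 | 3 |
| 13 | 2 | 3 |
| 14 | 2 | 3 |
| 15 | 7 | 7 |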


The response time for each image is analyzed in MATLAB and Xilinx ISE 14.7. Table 3 lists the
response times for these simulations. Table 4 presents the simulation outcome of the algorithm applied
to random images/videos taken with the author's camera.


Table 3. Comparison of the response time for detection

| Description | Response time in MATLAB (seconds) | Response time in Xilinx ISE (nanoseconds) |
|---|---|---|
| Object image/Video-1 | 0.39 | 0.672 |
| Object image/Video-2 | 0.45 | 0.428 |
| Object image/Video-3 | 0.42 | 0.512 |
| Object image/Video-4 | 0.67 | 0.905 |
| Object image/Video-5 | 0.72 | 1.005 |
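Because the two columns of Table 3 use different units, a direct comparison needs a unit conversion. The short sketch below converts the Xilinx ISE values to seconds and computes the ratio per row. The variable and function names are illustrative, and the ratio is plain arithmetic on the tabulated numbers: the two tools may be timing different amounts of work, so it should be read as an indication rather than a like-for-like speedup.

```python
# Values copied from Table 3, in row order Video-1 to Video-5.
matlab_seconds = [0.39, 0.45, 0.42, 0.67, 0.72]
ise_nanoseconds = [0.672, 0.428, 0.512, 0.905, 1.005]


def time_ratio(matlab_s, ise_ns):
    """MATLAB time divided by ISE time, with both expressed in seconds."""
    return matlab_s / (ise_ns * 1e-9)


ratios = [time_ratio(m, f) for m, f in zip(matlab_seconds, ise_nanoseconds)]

for index, ratio in enumerate(ratios, start=1):
    print(f"Object image/Video-{index}: ratio = {ratio:.3e}")
```

All five ratios come out in the range of roughly 10^8 to 10^9.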

Table 4. Simulation of the sampled image/video

| S. No | Original | After algorithm |
|---|---|---|
| Image/Video-1 | [image] | [image] |
| Image/Video-2 | [image] | [image] |
| Image/Video-3 | [image] | [image] |
| Image/Video-4 | [image] | [image] |
| Image/Video-5 | [image] | [image] |


5. CONCLUSIONS 

Real-time FPGA-based object tracking is frequently employed in applications such as video
surveillance, human-computer interaction, traffic monitoring, and vehicle navigation. Various algorithms
based on feature descriptors, optical flow, template matching, or texture operators are used. Most
often, the algorithms on the FPGA track either every moving object or only certain types of objects. The
HoG hardware chip, which is used to identify objects, was simulated successfully in Xilinx ISE 14.7. The
behaviour of the chip was verified using Modelsim 10.0 for object detection. The maximum estimated
response time on the FPGA is 1.005 ns, which is much less than the maximum MATLAB response time of
0.72 seconds. The maximum frequency supported by the design is reported as 315 MHz. The comparison
indicates that the FPGA-targeted simulation in Xilinx provides a much shorter delay than the MATLAB
response time.